

Proc. of Int. Conf. on Current Trends in Eng., Science and Technology, ICCTEST

# Design of 32 bit Low Power ALU Using Cadence

Mrs. Mahadevi S. Manur, Assistant Professor, Department of ECE, Don Bosco Institute of Technology, Bangalore, INDIA. Email: madhu.dbit@gmail.com

*Abstract*— The arithmetic logic unit (ALU) performs the arithmetic and logical operations of a system. High speed and low power consumption are important attributes of any digital circuit, and the speed of the individual modules in a design determines the overall performance of the system. The most important part of an ALU is the adder. A carry look-ahead adder is fast, but its layout area and the fan-out on some of its signals grow with the number of bits, which slows the adder down. A ripple carry adder has a longer delay but a minimal layout area.

Index Terms— ALU, BGFSB, Cadence, Adders, Power Estimation, Delay Estimation.

### I. INTRODUCTION

The main aim of this paper is to design a 32-bit ALU with improved performance, that is, lower delay and lower power dissipation. To achieve this, we use a technique called back-gate forward substrate bias (BGFSB). In this technique, a forward bias is applied to the back gate (bulk substrate) of conventional transistors, which lowers their threshold voltage so that they begin conducting at a lower gate voltage than the nominal threshold, thereby increasing their switching speed. To design the arithmetic logic unit, every block, and every component within each block, is optimized. The paper begins with a comparison of different adders; based on the literature survey, four adders, namely the 10T, 14T, CPL and hybrid adders, are implemented with and without BGFSB, and their power and delay models are derived.

#### A. ADDERS

i) 10T Adder: The 10T single-bit full adder has three inputs, A, B and C, and produces two outputs, sum and cout. Such designs benefit from a small transistor count and combine non-full-swing pass transistors with swing-restoring transmission-gate techniques. However, the extent to which they suffer from high input capacitance is less clear.



Figure 1: 10T Adder

*Grenze ID: 02.ICCTEST.2017.1.178* © *Grenze Scientific Society, 2017*

ii) 14T Adder: It consists of an inverter, four pass-transistor-logic EX-OR gates, and transmission-gate-based multiplexers for the sum and carry-out signals. The EX-OR output is inverted to obtain EX-NOR, and both EX-OR and EX-NOR are used simultaneously to generate the sum and carry out. The 14T adder is faster than the conventional adder.



iii) CPL Adder: The complementary pass-transistor logic (CPL) adder, comprising 32 transistors, is shown in Figure 3. Its main disadvantages are the higher transistor count and the resulting increase in delay. However, it does not suffer from the inherent threshold voltage drop of simple pass-transistor logic adders.



Figure 3: Complementary Pass Logic Adder

iv) Hybrid Adder: The hybrid full adder consists of three modules. The first module implements the EX-OR and EX-NOR functions. The second module allows an inverter to be connected to the last stage. The third and last module is a multiplexer implemented in CMOS logic style.



Figure 4: Hybrid Adder

The above adders, i.e. the 10T adder, the 14T adder, the complementary pass-transistor logic adder and the hybrid adder, were implemented in Cadence Virtuoso, and their performance was estimated in terms of power and delay. Table I gives the power comparison and Table II gives the delay comparison.

| Adder        | Without BGFSB Power (W) | With BGFSB Power (W) |
|--------------|------------------------|----------------------|
| 10T Adder    | 1.12e-6                | 6.77e-6              |
| 14T Adder    | 3.535e-7               | 1.965e-6             |
| CPL Adder    | 8.85e-7                | 7.3e-6               |
| Hybrid Adder | 5e-6                   | 5.8e-6               |

TABLE I: ADDERS POWER COMPARISON

| Adder        | Without BGFSB Delay(s) | With BGFSB Delay(s) |
|--------------|------------------------|---------------------|
| 10T Adder    | 1.55e-10               | 1.09e-10            |
| 14T Adder    | 5.818e-11              | 2.3e-12             |
| CPL Adder    | 1.32e-10               | 1.02e-10            |
| Hybrid Adder | 6.5e-10                | 5.4e-10             |

TABLE II: ADDERS DELAY COMPARISON
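
Tables I and II can also be read together through the power-delay product (PDP), a standard energy-per-operation figure of merit that the paper itself does not report. The Python sketch below only multiplies the tabulated values; the helper names `power_delay_product` and `best_adder` are our own. On these numbers the 14T adder has the lowest PDP both with and without BGFSB, because under BGFSB its delay falls far more than its power rises.

```python
# Power (W) and delay (s) copied from Tables I and II.
POWER_W = {
    "10T":    {"without": 1.12e-6,  "with": 6.77e-6},
    "14T":    {"without": 3.535e-7, "with": 1.965e-6},
    "CPL":    {"without": 8.85e-7,  "with": 7.3e-6},
    "Hybrid": {"without": 5e-6,     "with": 5.8e-6},
}
DELAY_S = {
    "10T":    {"without": 1.55e-10,  "with": 1.09e-10},
    "14T":    {"without": 5.818e-11, "with": 2.3e-12},
    "CPL":    {"without": 1.32e-10,  "with": 1.02e-10},
    "Hybrid": {"without": 6.5e-10,   "with": 5.4e-10},
}


def power_delay_product(adder: str, mode: str) -> float:
    """Energy per operation estimate in joules: average power x delay."""
    return POWER_W[adder][mode] * DELAY_S[adder][mode]


def best_adder(mode: str) -> str:
    """Adder with the smallest power-delay product for mode 'without' or 'with' BGFSB."""
    return min(POWER_W, key=lambda adder: power_delay_product(adder, mode))


if __name__ == "__main__":
    for mode in ("without", "with"):
        for adder in POWER_W:
            pdp = power_delay_product(adder, mode)
            print(f"{adder:7s} {mode:8s} BGFSB: PDP = {pdp:.3e} J")
        print(f"lowest PDP {mode} BGFSB: {best_adder(mode)}")
```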

### II. PROBLEM DEFINITION

- The conventional ALU design is not optimized jointly for area, power and delay.
- Conventional designs are not well suited to low-power applications.
- Hence, the main aim is to reduce the power consumption of the design and to increase the operating speed of the ALU.

The proposed solution is therefore to design the functional blocks in different logic styles to enhance performance.

### III. PROPOSED WORK

- A method called BGFSB (back-gate forward substrate bias) is applied to the existing conventional transistors.
- In BGFSB, a forward bias is applied to the back gate (bulk substrate) with respect to the source of the transistor while the transistor operates in its dynamic active mode.
- A forward bias voltage is applied to both PMOS and NMOS transistors.
- Applying BGFSB at a low supply voltage improves the circuit delay significantly, by a factor of about 1.5 to 2.
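
Why the benefit is largest at a low supply voltage can be illustrated with two textbook models: the long-channel body-effect expression for the threshold voltage and the alpha-power-law delay model. This is a sketch under assumed parameters (`vt0`, `gamma`, `phi_f` and `alpha` are illustrative values, not extracted from the 0.18 µm process), and it is not the method used to obtain the results in this paper. With these assumptions, a 0.4 V forward bias gives a speed-up of about 1.1 at 1.8 V but close to 2 at 0.6 V, because a given drop in V_T matters more when V_DD − V_T is small.

```python
import math


def nmos_threshold(v_bs: float, vt0: float = 0.45, gamma: float = 0.4,
                   phi_f: float = 0.42) -> float:
    """Long-channel body-effect model of the NMOS threshold voltage (V).

    v_bs > 0 is a forward back-gate (body-to-source) bias. The defaults for
    vt0, gamma and phi_f are illustrative assumptions, not process parameters.
    """
    two_phi = 2.0 * phi_f
    if v_bs >= two_phi:
        raise ValueError("forward bias outside the validity range of the model")
    return vt0 + gamma * (math.sqrt(two_phi - v_bs) - math.sqrt(two_phi))


def relative_delay(vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor, alpha-power law: t ~ VDD / (VDD - VT)**alpha."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vt) ** alpha


def bgfsb_speedup(vdd: float, v_bs: float) -> float:
    """Zero-bias delay divided by forward-biased delay at the same supply voltage."""
    return (relative_delay(vdd, nmos_threshold(0.0))
            / relative_delay(vdd, nmos_threshold(v_bs)))


if __name__ == "__main__":
    print(f"VT without bias: {nmos_threshold(0.0):.3f} V")
    print(f"VT with 0.4 V forward bias: {nmos_threshold(0.4):.3f} V")
    for vdd in (1.8, 1.0, 0.6):
        print(f"VDD = {vdd} V: speed-up {bgfsb_speedup(vdd, 0.4):.2f}")
```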



Figure 5: BGFSB in dynamic active mode

Figure 6 shows the detailed functional block diagram of a 1-bit arithmetic logic unit.



Figure 6: Block Diagram of 1-Bit ALU

The ALU in Figure 6 performs four arithmetic and four logical operations. The arithmetic operations are ADD, SUBTRACT, INCREMENT and DECREMENT, and the logical operations are AND, OR, EXOR and EXNOR.

Each stage of the ALU is composed of the following three components:

- A 4:1 and a 2:1 multiplexer for the inputs
- A full adder
- A 4:1 and a 2:1 multiplexer for the outputs

There are 4:1 multiplexers at both the input and the output. The input variables are applied to the input 4:1 multiplexer and to the full adder. For the logical operations, the inputs are applied to the logic gates and then to the second 4:1 multiplexer.

The multiplexers have two select inputs, S0 and S1. The 4-bit ALU uses 4-to-1 multiplexers designed in CMOS pass-transistor logic for low power. On the input side, the 4:1 multiplexer selects one of four values for the full adder, namely logic 1, logic 0, input variable B or its complement, depending on the arithmetic operation to be performed; on the output side, the 4:1 multiplexer passes the output of the full adder to the output pin. In the block diagram, the select pin S2 is connected to the 2:1 multiplexer.

When S2 is logic 0, the arithmetic operations are performed; when S2 is logic 1, the four logical operations are performed.

S2p, the complement of S2, is used in the input stage. When S2 = 1 (S2p = 0), a logic function appears at the output. Since the logical operations are performed using basic logic gates, the delay of each logic operation is simply the delay through the corresponding gate.

However, the arithmetic operations use the complete full adder. Increment and decrement are special cases of addition and subtraction: increment is an addition of 1, and subtraction is performed as two's-complement addition. The arithmetic operations are much more complex than the logical ones, since their performance depends not only on the logic style used to implement the SUM and CARRY units of the full adder, but also on the incoming bit pattern and the critical paths in the circuit. Optimizing the full adder therefore improves all operations to some extent. The Boolean expressions for SUM and CARRY are as follows:

$$\mathrm{SUM} = A \oplus B \oplus C_{IN}$$

$$\mathrm{CARRY} = AB + BC_{IN} + C_{IN}A$$
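
A quick way to confirm that these two expressions describe a full adder is to check all eight input combinations against integer addition: the pair (CARRY, SUM) must be the two-bit binary value of A + B + C_IN. This Python sketch is illustrative only; `full_adder` is a hypothetical helper name.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c_in: int) -> Tuple[int, int]:
    """SUM and CARRY exactly as in the Boolean expressions above."""
    s = a ^ b ^ c_in
    carry = (a & b) | (b & c_in) | (c_in & a)
    return s, carry


# Exhaustive check over all 2**3 input combinations.
for a, b, c_in in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c_in)
    assert 2 * carry + s == a + b + c_in, (a, b, c_in)
print("SUM/CARRY expressions verified for all 8 input combinations")
```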

where A and B are the two inputs and C_IN is the carry input to the full adder. As shown in Table III, the operation performed by the ALU is chosen by the three select signals S0, S1 and S2, allowing any one of the eight operations to be performed. S0 is the LSB and S2 is the MSB.

For the logical operations, every output bit is obtained in parallel, since the operation on each bit is independent of the others. For all arithmetic operations, each stage depends on the previous stage for its CARRY bit. After the full adder performs the necessary operation, the output multiplexer selects the correct output.

The value of signal S2 decides whether a logical or an arithmetic operation is performed. Figure 7 shows the topology of a 4-bit ripple carry adder, in which the carry propagates from one stage to the next. For some input patterns no rippling occurs, while for others the carry ripples all the way from the LSB to the MSB position.
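
The dependence on the input pattern can be made concrete by measuring how far a carry actually travels. A stage generates a carry when both operand bits are 1 and passes an incoming carry on when exactly one of them is 1. The Python sketch below (the helper `longest_carry_chain` is hypothetical) returns the longest such chain for an n-bit ripple carry adder with zero carry-in; a chain spanning all n stages is the pattern that exercises the full critical path.

```python
def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length (in stages) of the longest carry ripple in an n-bit RCA with C_IN = 0.

    Stage i generates a carry when a_i = b_i = 1 and propagates an incoming carry
    when exactly one of a_i, b_i is 1. A chain starts at a generating stage and
    grows by one for every following stage that propagates it.
    """
    longest = 0
    current = 0  # stages covered by the carry now rippling; 0 means no carry
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:                 # generate: a fresh carry starts here
            current = 1
        elif ai != bi and current:    # propagate: the carry moves one stage on
            current += 1
        else:                         # kill (0, 0), or nothing to propagate
            current = 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    print(longest_carry_chain(0b0101, 0b1010, 4))  # 0: every stage propagates, but no carry exists
    print(longest_carry_chain(0b0001, 0b1111, 4))  # 4: carry born at the LSB ripples to the MSB
```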

| S2 (MSB) | S1 | S0 (LSB) | Operation |
|---------|----|---------|-----------|
| 0       | 0  | 0       | INCREMENT |
| 0       | 0  | 1       | DECREMENT |
| 0       | 1  | 0       | ADD       |
| 0       | 1  | 1       | SUBTRACT  |
| 1       | 0  | 0       | AND       |
| 1       | 0  | 1       | OR        |
| 1       | 1  | 0       | EX-OR     |
| 1       | 1  | 1       | EX-NOR    |

TABLE III: TRUTH TABLE OF A 4-BIT ALU
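
To make Table III concrete, the sketch below is a behavioral Python model of an n-bit ALU assembled from identical 1-bit slices: S2 chooses between the bit-parallel logic path and the ripple-carry arithmetic path, and S1 S0 choose the operation. The paper does not give the carry-in for each arithmetic code, so the operand and carry-in pairs in `ARITH` are our assumption, consistent with the four multiplexer inputs described above (increment as A + 0 with carry-in 1, decrement as A plus all ones, subtraction as A + NOT B + 1). `alu` and `ARITH` are hypothetical names, and the model says nothing about timing or power.

```python
from typing import Tuple

# Operand selected by the input 4:1 mux and the carry-in, per code S1 S0.
# ASSUMPTION: the paper lists the mux inputs but not the carry-in values.
ARITH = {
    0b00: ("zero", 1),   # INCREMENT: A + 0 + 1
    0b01: ("ones", 0),   # DECREMENT: A + 11...1, i.e. A - 1
    0b10: ("b", 0),      # ADD:       A + B
    0b11: ("not_b", 1),  # SUBTRACT:  A + NOT B + 1 (two's complement)
}


def alu(a: int, b: int, s: int, n: int = 4) -> Tuple[int, int]:
    """Behavioral n-bit ALU following Table III; s is the 3-bit code S2 S1 S0.

    Returns (result, carry_out); carry_out is 0 for the logic operations.
    """
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    s2, low = (s >> 2) & 1, s & 0b11
    if s2:  # S2 = 1: logic path, all bits computed independently
        result = {0b00: a & b, 0b01: a | b, 0b10: a ^ b, 0b11: ~(a ^ b)}[low]
        return result & mask, 0
    operand, carry = ARITH[low]
    y = {"zero": 0, "ones": mask, "b": b, "not_b": ~b & mask}[operand]
    result = 0
    for i in range(n):  # ripple: stage i waits for the carry from stage i - 1
        ai, yi = (a >> i) & 1, (y >> i) & 1
        result |= (ai ^ yi ^ carry) << i
        carry = (ai & yi) | (yi & carry) | (carry & ai)
    return result, carry
```

For example, `alu(9, 3, 0b011)` computes 9 − 3 as 9 + 12 + 1 in four bits and returns `(6, 1)`.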

The propagation delay of such a structure is set by its critical path, that is, by the worst-case delay over all input patterns [3]. The delay is proportional to the number of bits N in the input words and is given by

$$T_{ADDER} = (N-1)\,t_{CARRY} + t_{SUM}$$



Figure 7: 4 Bit Ripple Carry Adder

where $t_{CARRY}$ and $t_{SUM}$ are the propagation delays of a single full-adder stage from carry-in to carry-out and from carry-in to sum, respectively.

Two important conclusions follow from this expression. (1) The propagation delay of the RCA is linearly proportional to N, the number of bits; this property becomes increasingly important when designing adders for wide data paths. (2) When designing the full adder cell for a fast ripple carry adder, it is far more important to optimize $t_{CARRY}$ than $t_{SUM}$, since $t_{SUM}$ has only a minor influence on the total value of $T_{ADDER}$. Worst-case delay calculations for all arithmetic operations are performed using this expression.
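
Both conclusions can be seen numerically by evaluating the expression for $T_{ADDER}$. The per-stage delays used below are hypothetical values chosen only to show the trend; they are not measurements from this design.

```python
def ripple_delay(n_bits: int, t_carry: float, t_sum: float) -> float:
    """Worst-case ripple carry adder delay: T_ADDER = (N - 1) * t_CARRY + t_SUM."""
    if n_bits < 1:
        raise ValueError("an adder needs at least one bit")
    return (n_bits - 1) * t_carry + t_sum


if __name__ == "__main__":
    # Hypothetical per-stage delays in seconds, for illustration only.
    T_CARRY, T_SUM = 50e-12, 80e-12
    for n in (4, 8, 16, 32):
        print(f"N = {n:2d}: T_ADDER = {ripple_delay(n, T_CARRY, T_SUM) * 1e12:.0f} ps")
    # Conclusion (2): for N = 32, halving t_CARRY helps far more than halving t_SUM.
    faster_carry = ripple_delay(32, T_CARRY / 2, T_SUM) * 1e12
    faster_sum = ripple_delay(32, T_CARRY, T_SUM / 2) * 1e12
    print(f"halve t_CARRY: {faster_carry:.0f} ps, halve t_SUM: {faster_sum:.0f} ps")
```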

For logical operations, since each bit has the same delay, it is sufficient to measure the delay through one stage of the ALU. The ALU was designed in 1.2 µm n-well SCMOS (scalable CMOS) technology, which uses two levels of polysilicon and two levels of metal for interconnection. The first polysilicon level is used both for gates and for interconnection. The technology supports the design of both analog and digital circuits.

The minimum-size MOSFET has a channel width of 1.8 µm and a channel length of 1.2 µm. Figure 6 shows the layout design of the 4-bit ALU. All PMOS transistors have a W/L of 3.6/1.2 and all NMOS transistors a W/L of 1.8/1.2 (dimensions in µm). Provision has been made in the design to apply back-gate forward substrate bias to all transistors externally.

Independent bonding pads are assigned to VBN and VBP, which connect to the p-substrate and the n-wells respectively, as shown in Fig. 6. The design can therefore be tested with and without BGFSB. In standard CMOS design, by contrast, the p-substrate and n-well are connected directly to the VSS (GND) and VDD pads, respectively.

### IV. RESULTS

The following figures show the simulation results for the 1-bit ALU with and without BGFSB, and for the 4-bit, 16-bit and 32-bit ALUs with BGFSB.



Fig 1. Arithmetic and logical unit functional block for 1 bit



Fig 3. Arithmetic and logic unit with Back Gate Forward Substrate Bias for 4 bit



Fig 2. Arithmetic and logical unit with Back Gate Forward Substrate Bias functional block for 1 bit



Fig 4. 16 bit Arithmetic logic unit with Back Gate Forward Substrate Bias



Fig 5. 32 bit Arithmetic logic unit with Back Gate Forward Substrate Bias

### V. POWER AND DELAY ESTIMATION OF 4, 8, 16 AND 32 BIT ALU

Tables IV and V list the estimated delay and power of the 4, 8, 16 and 32 bit ALUs with and without BGFSB.

| ALU DESIGN | WITHOUT BGFSB DELAY(Sec) | WITH BGFSB DELAY(Sec) |
|------------|--------------------------|-----------------------|
| 4 Bit ALU  | 5.36e-12                 | 2.35e-12              |
| 8 Bit ALU  | 6.48e-12                 | 2.46e-12              |
| 16 Bit ALU | 6.8e-12                  | 3.2e-12               |
| 32 Bit ALU | 7.04e-12                 | 4.2e-12               |

TABLE IV : DELAY ESTIMATION OF ALU

| ALU DESIGN | WITHOUT BGFSB POWER (W) | WITH BGFSB POWER (W) |
|------------|-------------------------|----------------------|
| 4 Bit ALU  | 8.094e-10               | 2.45e-4              |
| 8 Bit ALU  | 12.45e-10               | 4.4e-4               |
| 16 Bit ALU | 16.56e-10               | 6.5e-4               |
| 32 Bit ALU | 20e-10                  | 8.06e-4              |

TABLE V: POWER ESTIMATION OF ALU

From these estimates, applying BGFSB reduces the delay of the 32-bit ALU by about 40% compared with the conventional transistors, from 7.04e-12 s to 4.2e-12 s. The tabulated power dissipation, on the other hand, rises from 20e-10 W to 8.06e-4 W with the application of BGFSB.
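
The percentages can be checked directly against Tables IV and V. The Python sketch below recomputes them from the tabulated values only; the helper names are ours. It gives a delay reduction of about 40% for the 32-bit ALU and between roughly 53% and 62% for the 4-, 8- and 16-bit designs, while the tabulated power with BGFSB is roughly 3e5 to 4e5 times the value without it.

```python
# (without BGFSB, with BGFSB): delay in seconds (Table IV), power in watts (Table V).
ALU_DELAY_S = {4: (5.36e-12, 2.35e-12), 8: (6.48e-12, 2.46e-12),
               16: (6.8e-12, 3.2e-12), 32: (7.04e-12, 4.2e-12)}
ALU_POWER_W = {4: (8.094e-10, 2.45e-4), 8: (12.45e-10, 4.4e-4),
               16: (16.56e-10, 6.5e-4), 32: (20e-10, 8.06e-4)}


def delay_reduction_pct(bits: int) -> float:
    """Percentage reduction in delay obtained with BGFSB."""
    without, with_bias = ALU_DELAY_S[bits]
    return 100.0 * (without - with_bias) / without


def power_ratio(bits: int) -> float:
    """Factor by which the tabulated power changes with BGFSB (with / without)."""
    without, with_bias = ALU_POWER_W[bits]
    return with_bias / without


if __name__ == "__main__":
    for bits in sorted(ALU_DELAY_S):
        print(f"{bits:2d}-bit ALU: delay reduced by {delay_reduction_pct(bits):.1f}%, "
              f"power ratio {power_ratio(bits):.2e}")
```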

### VI. TOOLS UTILIZATION

- Cadence tools with 0.18 µm technology, using the Virtuoso Schematic Editor for all schematic and layout design.
- Analog Design Environment (ADE) for verification and analysis using the Spectre simulator.
- Assura for RC extraction.

### VII. CONCLUSION

- To reduce area, a ripple carry adder is used in the ALU.
- Power consumption decreases because fewer active devices are used.
- Reducing the area and using multiplexers in the arithmetic logic unit gives a significant power optimization.
- With the BGFSB method, the delay is 40% to 62% lower than that of the conventional design (Table IV).

### REFERENCES

- [1] Y. Jiang, A. Al-Sheraidah, Y. Wang, E. Sha, and J. Chung, "A novel multiplexer-based low-power full adder cell," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 7, pp. 345–348, Jul. 2004.
- [2] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Boston, MA: Addison Wesley, 1993.
- [3] S. Goel, A. Kumar, and M. Bayoumi, "Design of robust, energy-efficient full adders for deep sub-micrometer design using hybrid-CMOS logic style," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 12, pp. 1309–1321, Dec. 2006.
- [4] K. Navi, O. Kavehie, M. Rouhulamini, A. Sahafi, and S. Mehrabi, "A novel CMOS full adder," in Proc. 20th Int. Conf. VLSI Design, 2007, pp. 303–307.
- [5] W. R. Rafati, S. M. Fakhraie, and K. C. Smith, "Low-power data-driven dynamic logic (D3L)," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2000, pp. 752–755.
- [6] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri, and P. Corsonello, "Low power split-path data driven dynamic logic," IET Circuits, Devices, Syst., vol. 3, no. 6, pp. 303–312, 2009.
- [7] S. Purohit, M. Lanuzza, and M. Margala, "Design space exploration of split-path data driven dynamic full adder," J. Low Power Electron., vol. 6, no. 4, pp. 469–481, Dec. 2010.
- [8] A. Srivastava and D. Govindarajan, "A fast ALU design in CMOS for low voltage operation," Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803-5901, USA.